
Conversation

fhahn
Contributor

@fhahn fhahn commented Aug 27, 2025

Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and directly after the producer.

Depends on #155102 (included in
PR)

@llvmbot
Member

llvmbot commented Aug 27, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-backend-risc-v

Author: Florian Hahn (fhahn)

Changes

Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and directly after the producer.

Depends on #155102 (included in
PR)


Patch is 436.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/155670.diff

71 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlan.cpp (+5-21)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+10-12)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+1)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+11-26)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+71-13)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.h (+2-2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+63-35)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUtils.h (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/divs-with-scalable-vfs.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/drop-poison-generating-flags.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/epilog-vectorization-widen-inductions.ll (+6-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/first-order-recurrence-fold-tail.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/interleave-with-gaps.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/pr60831-sve-inv-store-crash.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-interleaved-accesses.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/transform-narrow-interleave-to-widen-memory-with-wide-ops.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/type-shrinkage-insertelt.ll (+48-48)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/widen-call-with-intrinsic-or-libfunc.ll (+2-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/wider-VF-for-callinst.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/consecutive-ptr-uniforms.ll (+13-14)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/drop-poison-generating-flags.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gather_scatter.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/gep-use-outside-loop.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/induction-costs.ll (+15-15)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleave-ptradd-with-replicated-operand.ll (+16-16)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-hoist-load-across-store.ll (+36-36)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/interleaved-accesses-sink-store-across-load.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/parallel-loops.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/replicate-recipe-with-only-first-lane-used.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/strided_load_cost.ll (+39-39)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/vplan-native-inner-loop-only.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/widened-value-used-as-scalar-and-first-lane.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-accesses-masked-group.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/x86-interleaved-store-accesses-with-gaps.ll (+14-14)
  • (modified) llvm/test/Transforms/LoopVectorize/assume.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/bsd_regex.ll (+7-7)
  • (modified) llvm/test/Transforms/LoopVectorize/dereferenceable-info-from-assumption-constant-size.ll (+10-12)
  • (modified) llvm/test/Transforms/LoopVectorize/forked-pointers.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/histograms.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/induction.ll (+78-78)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses-pred-stores.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/interleaved-accesses.ll (+34-34)
  • (modified) llvm/test/Transforms/LoopVectorize/load-deref-pred-align.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/loop-scalars.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/metadata.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/optimal-epilog-vectorization.ll (+8-8)
  • (modified) llvm/test/Transforms/LoopVectorize/optsize.ll (+5-5)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction-index-width-smaller-than-iv-width.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/pointer-induction.ll (+8-7)
  • (modified) llvm/test/Transforms/LoopVectorize/pr39417-optsize-scevchecks.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/predicate-switch.ll (+16-10)
  • (modified) llvm/test/Transforms/LoopVectorize/preserve-dbg-loc-and-loop-metadata.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/reduction-with-invariant-store.ll (+12-12)
  • (modified) llvm/test/Transforms/LoopVectorize/reuse-lcssa-phi-scev-expansion.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/scalable-assume.ll (+2-2)
  • (modified) llvm/test/Transforms/LoopVectorize/scev-predicate-reasoning.ll (+9-9)
  • (modified) llvm/test/Transforms/LoopVectorize/single-value-blend-phis.ll (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/struct-return-replicate.ll (+33-33)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform-args-call-variants.ll (+4-4)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform-blend.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll (+120-120)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_and.ll (+68-68)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_div_urem.ll (+66-66)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1_lshr.ll (+220-220)
  • (modified) llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll (+464-464)
  • (modified) llvm/test/Transforms/LoopVectorize/version-stride-with-integer-casts.ll (+3-3)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-native-path-inner-loop-with-runtime-checks.ll (+10-10)
  • (modified) llvm/test/Transforms/LoopVectorize/vplan-predicate-switch.ll (+8-4)
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.cpp b/llvm/lib/Transforms/Vectorize/VPlan.cpp
index 1438dc366b55d..1ada5e413bd9e 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlan.cpp
@@ -333,6 +333,9 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
     LastLane = 0;
   }
 
+  assert(IsSingleScalar && "must be a single-scalar at this point");
+  // We need to construct the vector value for a single-scalar value by
+  // broadcasting the scalar to all lanes.
   auto *LastInst = cast<Instruction>(get(Def, LastLane));
   // Set the insert point after the last scalarized instruction or after the
   // last PHI, if LastInst is a PHI. This ensures the insertelement sequence
@@ -343,27 +346,8 @@ Value *VPTransformState::get(const VPValue *Def, bool NeedsScalar) {
                    : std::next(BasicBlock::iterator(LastInst));
   Builder.SetInsertPoint(&*NewIP);
 
-  // However, if we are vectorizing, we need to construct the vector values.
-  // If the value is known to be uniform after vectorization, we can just
-  // broadcast the scalar value corresponding to lane zero. Otherwise, we
-  // construct the vector values using insertelement instructions. Since the
-  // resulting vectors are stored in State, we will only generate the
-  // insertelements once.
-  Value *VectorValue = nullptr;
-  if (IsSingleScalar) {
-    VectorValue = GetBroadcastInstrs(ScalarValue);
-    set(Def, VectorValue);
-  } else {
-    assert(!VF.isScalable() && "VF is assumed to be non scalable.");
-    assert(isa<VPInstruction>(Def) &&
-           "Explicit BuildVector recipes must have"
-           "handled packing for non-VPInstructions.");
-    // Initialize packing with insertelements to start from poison.
-    VectorValue = PoisonValue::get(toVectorizedTy(LastInst->getType(), VF));
-    for (unsigned Lane = 0; Lane < VF.getFixedValue(); ++Lane)
-      VectorValue = packScalarIntoVectorizedValue(Def, VectorValue, Lane);
-    set(Def, VectorValue);
-  }
+  Value *VectorValue = GetBroadcastInstrs(ScalarValue);
+  set(Def, VectorValue);
   Builder.restoreIP(OldIP);
   return VectorValue;
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 33bcb49b81740..b0ada25c6fe67 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -900,6 +900,8 @@ struct VPRecipeWithIRFlags : public VPSingleDefRecipe, public VPIRFlags {
     return R && classof(R);
   }
 
+  virtual VPRecipeWithIRFlags *clone() override = 0;
+
   void execute(VPTransformState &State) override = 0;
 
   /// Compute the cost for this recipe for \p VF, using \p Opcode and \p Ctx.
@@ -1042,15 +1044,9 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
     ResumeForEpilogue,
     /// Returns the value for vscale.
     VScale,
+    Unpack,
   };
 
-private:
-  typedef unsigned char OpcodeTy;
-  OpcodeTy Opcode;
-
-  /// An optional name that can be used for the generated IR instruction.
-  const std::string Name;
-
   /// Returns true if this VPInstruction generates scalar values for all lanes.
   /// Most VPInstructions generate a single value per part, either vector or
   /// scalar. VPReplicateRecipe takes care of generating multiple (scalar)
@@ -1059,6 +1055,13 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
   /// underlying ingredient.
   bool doesGeneratePerAllLanes() const;
 
+private:
+  typedef unsigned char OpcodeTy;
+  OpcodeTy Opcode;
+
+  /// An optional name that can be used for the generated IR instruction.
+  const std::string Name;
+
   /// Returns true if we can generate a scalar for the first lane only if
   /// needed.
   bool canGenerateScalarForFirstLane() const;
@@ -1068,11 +1071,6 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
   /// existing value is returned rather than a generated one.
   Value *generate(VPTransformState &State);
 
-  /// Utility methods serving execute(): generates a scalar single instance of
-  /// the modeled instruction for a given lane. \returns the scalar generated
-  /// value for lane \p Lane.
-  Value *generatePerLane(VPTransformState &State, const VPLane &Lane);
-
 #if !defined(NDEBUG)
   /// Return the number of operands determined by the opcode of the
   /// VPInstruction. Returns -1u if the number of operands cannot be determined
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index 747c6623aa22a..75d23f5103503 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -109,6 +109,7 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
   case VPInstruction::AnyOf:
   case VPInstruction::BuildStructVector:
   case VPInstruction::BuildVector:
+  case VPInstruction::Unpack:
     return SetResultTyFromOp();
   case VPInstruction::ExtractLane:
     return inferScalarType(R->getOperand(1));
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 86834ab1240c1..cf6e2360d76bd 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -467,6 +467,7 @@ unsigned VPInstruction::getNumOperandsForOpcode(unsigned Opcode) {
   case VPInstruction::ExtractPenultimateElement:
   case VPInstruction::FirstActiveLane:
   case VPInstruction::Not:
+  case VPInstruction::Unpack:
     return 1;
   case Instruction::ICmp:
   case Instruction::FCmp:
@@ -525,16 +526,6 @@ bool VPInstruction::canGenerateScalarForFirstLane() const {
   }
 }
 
-Value *VPInstruction::generatePerLane(VPTransformState &State,
-                                      const VPLane &Lane) {
-  IRBuilderBase &Builder = State.Builder;
-
-  assert(getOpcode() == VPInstruction::PtrAdd &&
-         "only PtrAdd opcodes are supported for now");
-  return Builder.CreatePtrAdd(State.get(getOperand(0), Lane),
-                              State.get(getOperand(1), Lane), Name);
-}
-
 /// Create a conditional branch using \p Cond branching to the successors of \p
 /// VPBB. Note that the first successor is always forward (i.e. not created yet)
 /// while the second successor may already have been created (if it is a header
@@ -1154,24 +1145,13 @@ void VPInstruction::execute(VPTransformState &State) {
          "Set flags not supported for the provided opcode");
   if (hasFastMathFlags())
     State.Builder.setFastMathFlags(getFastMathFlags());
-  bool GeneratesPerFirstLaneOnly = canGenerateScalarForFirstLane() &&
-                                   (vputils::onlyFirstLaneUsed(this) ||
-                                    isVectorToScalar() || isSingleScalar());
-  bool GeneratesPerAllLanes = doesGeneratePerAllLanes();
-  if (GeneratesPerAllLanes) {
-    for (unsigned Lane = 0, NumLanes = State.VF.getFixedValue();
-         Lane != NumLanes; ++Lane) {
-      Value *GeneratedValue = generatePerLane(State, VPLane(Lane));
-      assert(GeneratedValue && "generatePerLane must produce a value");
-      State.set(this, GeneratedValue, VPLane(Lane));
-    }
-    return;
-  }
-
   Value *GeneratedValue = generate(State);
   if (!hasResult())
     return;
   assert(GeneratedValue && "generate must produce a value");
+  bool GeneratesPerFirstLaneOnly = canGenerateScalarForFirstLane() &&
+                                   (vputils::onlyFirstLaneUsed(this) ||
+                                    isVectorToScalar() || isSingleScalar());
   assert((((GeneratedValue->getType()->isVectorTy() ||
             GeneratedValue->getType()->isStructTy()) ==
            !GeneratesPerFirstLaneOnly) ||
@@ -1209,6 +1189,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   case VPInstruction::StepVector:
   case VPInstruction::ReductionStartVector:
   case VPInstruction::VScale:
+  case VPInstruction::Unpack:
     return false;
   default:
     return true;
@@ -1244,10 +1225,11 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
   case VPInstruction::Broadcast:
   case VPInstruction::ReductionStartVector:
     return true;
+  case VPInstruction::BuildStructVector:
+  case VPInstruction::BuildVector:
+    return getNumOperands() > 1;
   case VPInstruction::PtrAdd:
     return Op == getOperand(0) || vputils::onlyFirstLaneUsed(this);
-  case VPInstruction::WidePtrAdd:
-    return Op == getOperand(0);
   case VPInstruction::ComputeAnyOfResult:
   case VPInstruction::ComputeFindIVResult:
     return Op == getOperand(1);
@@ -1371,6 +1353,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
   case VPInstruction::ResumeForEpilogue:
     O << "resume-for-epilogue";
     break;
+  case VPInstruction::Unpack:
+    O << "unpack-into-scalars";
+    break;
   default:
     O << Instruction::getOpcodeName(getOpcode());
   }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index d32d2a9ad11f7..110f127d0d741 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1189,6 +1189,15 @@ static void simplifyRecipe(VPRecipeBase &R, VPTypeAnalysis &TypeInfo) {
     return;
   }
 
+  VPValue *Idx;
+  if (match(&R, m_VPInstruction<Instruction::ExtractElement>(m_BuildVector(),
+                                                             m_VPValue(Idx)))) {
+    auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
+    Def->replaceAllUsesWith(BuildVector->getOperand(
+        dyn_cast<ConstantInt>(Idx->getLiveInIRValue())->getZExtValue()));
+    return;
+  }
+
   if (auto *Phi = dyn_cast<VPPhi>(Def)) {
     if (Phi->getNumOperands() == 1)
       Phi->replaceAllUsesWith(Phi->getOperand(0));
@@ -3370,40 +3379,89 @@ void VPlanTransforms::materializeBuildVectors(VPlan &Plan) {
       vp_depth_first_shallow(Plan.getEntry()));
   auto VPBBsInsideLoopRegion = VPBlockUtils::blocksOnly<VPBasicBlock>(
       vp_depth_first_shallow(LoopRegion->getEntry()));
-  // Materialize Build(Struct)Vector for all replicating VPReplicateRecipes,
-  // excluding ones in replicate regions. Those are not materialized explicitly
-  // yet. Those vector users are still handled in VPReplicateRegion::execute(),
-  // via shouldPack().
+  // Materialize Build(Struct)Vector for all replicating VPReplicateRecipes and
+  // VPInstructions, excluding ones in replicate regions. Those are not
+  // materialized explicitly yet. Those vector users are still handled in
+  // VPReplicateRegion::execute(), via shouldPack().
   // TODO: materialize build vectors for replicating recipes in replicating
   // regions.
   // TODO: materialize build vectors for VPInstructions.
   for (VPBasicBlock *VPBB :
        concat<VPBasicBlock *>(VPBBsOutsideLoopRegion, VPBBsInsideLoopRegion)) {
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
-      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
-      auto UsesVectorOrInsideReplicateRegion = [RepR, LoopRegion](VPUser *U) {
+      auto *DefR = dyn_cast<VPRecipeWithIRFlags>(&R);
+      if (!DefR)
+        continue;
+      if (!isa<VPReplicateRecipe, VPInstruction>(DefR))
+        continue;
+      auto UsesVectorOrInsideReplicateRegion = [DefR, LoopRegion](VPUser *U) {
         VPRegionBlock *ParentRegion =
             cast<VPRecipeBase>(U)->getParent()->getParent();
-        return !U->usesScalars(RepR) || ParentRegion != LoopRegion;
+        return !U->usesScalars(DefR) || ParentRegion != LoopRegion;
       };
-      if (!RepR || RepR->isSingleScalar() ||
-          none_of(RepR->users(), UsesVectorOrInsideReplicateRegion))
+      if ((isa<VPReplicateRecipe>(DefR) &&
+           cast<VPReplicateRecipe>(DefR)->isSingleScalar()) ||
+          (isa<VPInstruction>(DefR) &&
+           !cast<VPInstruction>(DefR)->doesGeneratePerAllLanes()) ||
+          vputils::onlyFirstLaneUsed(DefR) ||
+          none_of(DefR->users(), UsesVectorOrInsideReplicateRegion))
         continue;
 
-      Type *ScalarTy = TypeInfo.inferScalarType(RepR);
+      Type *ScalarTy = TypeInfo.inferScalarType(DefR);
       unsigned Opcode = ScalarTy->isStructTy()
                             ? VPInstruction::BuildStructVector
                             : VPInstruction::BuildVector;
-      auto *BuildVector = new VPInstruction(Opcode, {RepR});
-      BuildVector->insertAfter(RepR);
+      auto *BuildVector = new VPInstruction(Opcode, {DefR});
+      BuildVector->insertAfter(DefR);
 
-      RepR->replaceUsesWithIf(
+      DefR->replaceUsesWithIf(
           BuildVector, [BuildVector, &UsesVectorOrInsideReplicateRegion](
                            VPUser &U, unsigned) {
             return &U != BuildVector && UsesVectorOrInsideReplicateRegion(&U);
           });
     }
   }
+
+  // Create explicit VPInstructions to convert vectors to scalars.
+  for (VPBasicBlock *VPBB :
+       concat<VPBasicBlock *>(VPBBsOutsideLoopRegion, VPBBsInsideLoopRegion)) {
+    for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
+      if (isa<VPReplicateRecipe, VPInstruction, VPScalarIVStepsRecipe>(&R))
+        continue;
+      for (VPValue *Def : R.definedValues()) {
+        if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def))
+          continue;
+
+        if (VPBB->getParent() != Plan.getVectorLoopRegion())
+          continue;
+
+        auto UsesVectorOrInsideReplicateRegion = [LoopRegion](VPUser *U) {
+          VPRegionBlock *ParentRegion =
+              cast<VPRecipeBase>(U)->getParent()->getParent();
+          return ParentRegion && ParentRegion != LoopRegion;
+        };
+
+        if (none_of(Def->users(),
+                    [Def, &UsesVectorOrInsideReplicateRegion](VPUser *U) {
+                      return !UsesVectorOrInsideReplicateRegion(U) &&
+                             U->usesScalars(Def);
+                    }))
+          continue;
+
+        auto *Unpack = new VPInstruction(VPInstruction::Unpack, {Def});
+        if (R.isPhi())
+          Unpack->insertBefore(*VPBB, VPBB->getFirstNonPhi());
+        else
+          Unpack->insertAfter(&R);
+        Def->replaceUsesWithIf(
+            Unpack,
+            [Def, &UsesVectorOrInsideReplicateRegion](VPUser &U, unsigned) {
+              return !UsesVectorOrInsideReplicateRegion(&U) &&
+                     U.usesScalars(Def);
+            });
+      }
+    }
+  }
 }
 
 void VPlanTransforms::materializeVectorTripCount(VPlan &Plan,
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
index 9cf62a35ae36b..bd4b94fa034c4 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.h
@@ -158,8 +158,8 @@ struct VPlanTransforms {
   /// Explicitly unroll \p Plan by \p UF.
   static void unrollByUF(VPlan &Plan, unsigned UF);
 
-  /// Replace each VPReplicateRecipe outside on any replicate region in \p Plan
-  /// with \p VF single-scalar recipes.
+  /// Replace each VPReplicateRecipe and replicating VPInstruction outside on
+  /// any replicate region in \p Plan with \p VF single-scalar recipes.
   /// TODO: Also replicate VPReplicateRecipes inside replicate regions, thereby
   /// dissolving the latter.
   static void replicateByVF(VPlan &Plan, ElementCount VF);
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index 7a63d20825a31..9155335f9dde6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -463,15 +463,34 @@ void VPlanTransforms::unrollByUF(VPlan &Plan, unsigned UF) {
   VPlanTransforms::removeDeadRecipes(Plan);
 }
 
-/// Create a single-scalar clone of \p RepR for lane \p Lane. Use \p
-/// Def2LaneDefs to look up scalar definitions for operands of \RepR.
-static VPReplicateRecipe *
+/// Create a single-scalar clone of \p DefR for lane \p Lane. Use \p
+/// Def2LaneDefs to look up scalar definitions for operands of \DefR.
+static VPValue *
 cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
-             VPReplicateRecipe *RepR, VPLane Lane,
+             VPRecipeWithIRFlags *DefR, VPLane Lane,
              const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) {
+
+  VPValue *Op;
+  if (match(DefR, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op)))) {
+    auto LaneDefs = Def2LaneDefs.find(Op);
+    if (LaneDefs != Def2LaneDefs.end())
+      return LaneDefs->second[Lane.getKnownLane()];
+
+    VPValue *Idx =
+        Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane()));
+    return Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
+  }
+
   // Collect the operands at Lane, creating extracts as needed.
   SmallVector<VPValue *> NewOps;
-  for (VPValue *Op : RepR->operands()) {
+  for (VPValue *Op : DefR->operands()) {
+    if (Lane.getKind() == VPLane::Kind::ScalableLast) {
+      match(Op, m_VPInstruction<VPInstruction::Unpack>(m_VPValue(Op)));
+      NewOps.push_back(
+          Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op}));
+      continue;
+    }
+
     // If Op is a definition that has been unrolled, directly use the clone for
     // the corresponding lane.
     auto LaneDefs = Def2LaneDefs.find(Op);
@@ -479,11 +498,6 @@ cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
       NewOps.push_back(LaneDefs->second[Lane.getKnownLane()]);
       continue;
     }
-    if (Lane.getKind() == VPLane::Kind::ScalableLast) {
-      NewOps.push_back(
-          Builder.createNaryOp(VPInstruction::ExtractLastElement, {Op}));
-      continue;
-    }
     if (vputils::isSingleScalar(Op)) {
       NewOps.push_back(Op);
       continue;
@@ -497,15 +511,23 @@ cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
     }
     VPValue *Idx =
         Plan.getOrAddLiveIn(ConstantInt::get(IdxTy, Lane.getKnownLane()));
-    VPValue *Ext = Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
-    NewOps.push_back(Ext);
+    NewOps.push_back(
+        Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx}));
   }
 
-  auto *New =
-      new VPReplicateRecipe(RepR->getUnderlyingInstr(), NewOps,
-                            /*IsSingleScalar=*/true, /*Mask=*/nullptr, *RepR);
-  New->transferFlags(*RepR);
-  New->insertBefore(RepR);
+  VPRecipeWithIRFlags *New;
+  if (auto *RepR = dyn_cast<VPReplicateRecipe>(DefR)) {
+    New =
+        new VPReplicateRecipe(RepR->getUnderlyingInstr(), NewOps,
+                              /*IsSingleScalar=*/true, /*Mask=*/nullptr, *RepR);
+  } else {
+    New = DefR->clone();
+    for (const auto &[Idx, Op] : enumerate(NewOps)) {
+      New->setOperand(Idx, Op);
+    }
+  }
+  New->transferFlags(*DefR);
+  New->insertBefore(DefR);
   return New;
 }
 
@@ -530,41 +552,47 @@ void VPlanTransforms::replicateByVF(VPlan &Plan, ElementCount VF) {
   SmallVector<VPRecipeBase *> ToRemove;
   for (VPBasicBlock *VPBB : VPBBsToUnroll) {
     for (VPRecipeBase &R : make_early_inc_range(*VPBB)) {
-      auto *RepR = dyn_cast<VPReplicateRecipe>(&R);
-      if (!RepR || RepR->isSingleScalar())
+      auto *DefR = dyn_cast<VPRecipeWithIRFlags>(&R);
+      if (!DefR || !isa<VPInstruction, VPReplicateRecipe>(DefR))
+        continue;
+      if ((isa<VPReplicateRecipe>(DefR) &&
+           cast<VPReplicateRecipe>(DefR)->isSingleScalar()) ||
+          (isa<VPInstruction>(DefR) &&
+           !cast<VPInstruction>(DefR)->doesGeneratePerAllLanes() &&
+           cast<VPInstruction>(DefR)->getOpcode() != VPInstruction::Unpack))
         continue;
 
-      VPBuilder Builder(RepR);
-      if (RepR->getNumUsers() == 0) {
-        if (isa<StoreInst>(RepR->getUnderlyingInstr()) &&
-            vputils::isSingleScalar(RepR->getOperand(1))) {
+      VPBuilder Builder(DefR);
+      if (DefR->getNumUsers() == 0) {
+        if (isa<StoreInst>(DefR->getUnderlyingInstr()) &&
+            vputils::isSingleScalar(DefR->getOperand(1))) {
           // Stores to invariant addresses need to store the last lane only.
-          cloneForLane(Plan, Builder, IdxTy, RepR, VPLane::getLastLaneForVF(VF),
+          cloneForLane(Plan, Builder, IdxTy, DefR, VPLane::getLastLaneForVF(VF),
                        Def2LaneDefs);
         } else {
-          // Create single-scalar version of RepR for all lanes.
+          // Create single-scalar version of DefR for all lanes.
           for (unsigned I = 0; I != VF.getKnownMinValue(); ++I)
-         ...
[truncated]

Contributor

@artagnon artagnon left a comment

Confused.

Comment on lines 1231 to 1288
case VPInstruction::WidePtrAdd:
return Op == getOperand(0);
Contributor

Unrelated?

Contributor Author

The current code here doesn't match the codegen for WidePtrAdd, which is exposed by this patch; it only uses the first lane if the pointer is a single scalar.

With this code, we would incorrectly insert an unpack and only provide the first lane instead of the vector of pointers that is needed.

Add a new Unpack VPInstruction (name to be improved) to explicitly
extract scalar values from vectors.

Test changes are movements of the extracts: they are now generated
together and directly after the producer.

Depends on llvm#155102 (included in
PR)

	modified:   llvm/test/Transforms/LoopVectorize/RISCV/riscv-vector-reverse.ll
@fhahn fhahn force-pushed the vplan-explicit-unpack branch from b6198b4 to b49efbc Compare September 15, 2025 17:55
}

VPValue *Idx;
if (match(&R, m_VPInstruction<Instruction::ExtractElement>(m_BuildVector(),
Contributor

Might be good to add an m_ExtractElement?

m_VPValue(Idx)))) {
auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
Def->replaceAllUsesWith(BuildVector->getOperand(
cast<ConstantInt>(Idx->getLiveInIRValue())->getZExtValue()));
Contributor

Might be good to replace this with m_ConstantInt, once the patch I put up is merged?

Contributor Author

Done, thanks.

if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def))
continue;

if (VPBB->getParent() != Plan.getVectorLoopRegion())
Contributor

Suggested change
if (VPBB->getParent() != Plan.getVectorLoopRegion())
if (VPBB->getParent() != LoopRegion)

Can this ever happen? I thought we were doing a shallow traversal?

Contributor Author

This skipped VPInstructions outside regions; I updated the traversal to not visit those.

if (VPBB->getParent() != Plan.getVectorLoopRegion())
continue;

auto UsesVectorOrInsideReplicateRegion = [LoopRegion](VPUser *U) {
Contributor

Suggested change
auto UsesVectorOrInsideReplicateRegion = [LoopRegion](VPUser *U) {
auto IsInsideReplicateRegion = [LoopRegion](VPUser *U) {

Where does it check that it uses a vector?

Contributor Author

Yes, that needs to be done separately in the latest version; renamed, thanks.

if (none_of(Def->users(),
[Def, &UsesVectorOrInsideReplicateRegion](VPUser *U) {
return !UsesVectorOrInsideReplicateRegion(U) &&
U->usesScalars(Def);
Contributor

Quick note: usesScalars can be a bit weak; I put up chainUsesScalarValues to fix this, but I'm not sure if it's the best idea.

Contributor Author

For this use case it should be fine; we just need to know whether the recipe may use scalars, and provide them.

Contributor

@artagnon artagnon left a comment

I couldn't find any additional issues, but I'm afraid I'm missing context for VPlanUnroll: I've never played with the ScalableLast lane, so I'll leave this to a more qualified reviewer (@ayalz?). Thanks.

cloneForLane(VPlan &Plan, VPBuilder &Builder, Type *IdxTy,
VPRecipeWithIRFlags *DefR, VPLane Lane,
const DenseMap<VPValue *, SmallVector<VPValue *>> &Def2LaneDefs) {

Contributor:

Stray newline?

Collaborator:

+1

Contributor Author:

Removed, thanks

Comment on lines 7189 to 7190
// TODO: Move to VPlan transform stage once the transition to the VPlan-based
// cost model is complete for better cost estimates.
Contributor:

Is this comment still relevant?

Collaborator:

Yes, comment indicates that these transforms are still taking place during VPlan execution rather than planning. Perhaps worth rewording to clarify.

// cost model is complete for better cost estimates.
VPlanTransforms::runPass(VPlanTransforms::unrollByUF, BestVPlan, BestUF);
VPlanTransforms::runPass(VPlanTransforms::materializeBuildVectors, BestVPlan);
VPlanTransforms::runPass(VPlanTransforms::materializeBuildAndUnpackVectors,
Collaborator:

How about using Pack to denote both BuildVector and BuildStructVector, consistent with Unpack; and drop Vectors suffix, consistent with the following materializeBroadcasts()? Could fuse both into one, or split into three: pack, unpack, broadcast.

Suggested change
VPlanTransforms::runPass(VPlanTransforms::materializeBuildAndUnpackVectors,
VPlanTransforms::runPass(VPlanTransforms::materializePacksAndUnpacks,

Contributor Author:

Updated, thanks

Comment on lines 1286 to 1288
case VPInstruction::WidePtrAdd:
return Op == getOperand(0);
Collaborator:

How/Is this related?

Contributor Author:

The patch exposed an issue here, more details in https://github.com/llvm/llvm-project/pull/155670/files#r2347298616 above

Collaborator:

Defaulting to false seems a bit odd, as the first operand should be a scalar base. Worth a comment?

Contributor Author:

Added a comment. It supports both vector and scalar base addresses.

Comment on lines 1411 to 1412
case VPInstruction::UnpackVector:
O << "unpack-into-scalars";
Collaborator:

If clearer should the enum also be UnpackIntoScalars? Although Unpack should suffice for both, with documentation explaining what the term stands for.

Contributor Author:

Yep, updated to unpack


uint64_t Idx;
if (match(&R, m_ExtractElement(m_BuildVector(), m_ConstantInt(Idx)))) {
auto *BuildVector = cast<VPInstruction>(R.getOperand(0));
Collaborator:

Could m_BuildVector(Op0) provide the desired operand?

Contributor Author:

Hmm, I think it could, but it would be inconsistent with other matchers, where the inner matchers match the operands.

Comment on lines 501 to 517
VPValue *Ext = Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx});
NewOps.push_back(Ext);
NewOps.push_back(
Builder.createNaryOp(Instruction::ExtractElement, {Op, Idx}));
Collaborator:

Independent?

Contributor Author:

Restored the original code, thanks

Comment on lines 87 to 88
if (auto *Reduce = dyn_cast<VPReductionRecipe>(VPV))
return true;
Collaborator:

Independent, or only tested now?

Contributor Author:

Stripped again for now, thanks

SmallVector<VPValue *> NewOps;
for (VPValue *Op : DefR->operands()) {
if (Lane.getKind() == VPLane::Kind::ScalableLast) {
match(Op, m_VPInstruction<VPInstruction::UnpackVector>(m_VPValue(Op)));
Collaborator:

Worth clarifying that Op is being redefined from an operand of DefR to its own operand.
Is it important to hoist from current location below?
Worth documenting Unpack's role in extracting ScalableLast?

Contributor Author:

Added a comment, thanks.

It can stay at the original position in the latest version, moved back, thanks

ResumeForEpilogue,
/// Returns the value for vscale.
VScale,
/// Extracts all lanes from its (non-scalable) vector operand.
Collaborator:

This is an abstract VPInstruction whose single defined VPValue represents VF scalars extracted from a vector, to be replaced by VF ExtractElement VPInstructions?

Contributor Author:

Added thanks

/// Extracts all lanes from its (non-scalable) vector operand. This is an
/// abstract VPInstruction whose single defined VPValue represents VF
/// scalars extracted from a vector, to be replaced by VF ExtractElement
/// VPInstructions?
Collaborator:

Suggested change
/// VPInstructions?
/// VPInstructions.

Contributor Author:

Fixed, thanks

auto IsInsideReplicateRegion = [LoopRegion](VPUser *U) {
VPRegionBlock *ParentRegion =
cast<VPRecipeBase>(U)->getParent()->getParent();
return ParentRegion && ParentRegion != LoopRegion;
Collaborator:

Better check if ParentRegion is replicating, consistent with the lambda's name?

Contributor Author:

Done thanks!

for (VPValue *Def : R.definedValues()) {
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def))
continue;

Collaborator:

Thanks! Better placed before the lambda?

// At the moment, we only create unpacks for scalar users outside
// replicate regions. Recipes inside replicate regions still manually
// extract the required lanes. TODO: Remove once replicate regions are
// unrolled explicitly.
Collaborator:

Suggested change
// unrolled explicitly.
// unrolled completely.

They are already unrolled explicitly by UF, but have yet to also be unrolled by VF.

Contributor Author:

Yep, updated, thanks

if (isa<VPReplicateRecipe, VPInstruction, VPScalarIVStepsRecipe>(&R))
continue;
for (VPValue *Def : R.definedValues()) {
if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def))
Collaborator:

The onlyFirstLaneUsed case may require a single extract rather than full unpack.

Contributor Author:

Yep, I added a comment that the recipes skipped here may or may not produce a vector eventually, along with a TODO to introduce unpacks for them as well and remove them once we are guaranteed to produce scalars.

For that reason, I left the code as is here for now.

// extract the required lanes. TODO: Remove once replicate regions are
// unrolled explicitly.
if (none_of(Def->users(), [Def, &IsInsideReplicateRegion](VPUser *U) {
return !IsInsideReplicateRegion(U) && U->usesScalars(Def);
Collaborator:

Suggested change
return !IsInsideReplicateRegion(U) && U->usesScalars(Def);
return U->usesScalars(Def) && !IsInsideReplicateRegion(U);

sounds a bit cleaner?

Contributor Author:

Done, thanks

Unpack->insertAfter(&R);
Def->replaceUsesWithIf(
Unpack, [Def, &IsInsideReplicateRegion](VPUser &U, unsigned) {
return !IsInsideReplicateRegion(&U) && U.usesScalars(Def);
Collaborator:

Ditto
Worth defining an IsCandidateUnpackUser()?

Contributor Author:

Updated; merged IsInsideReplicateRegion into the new IsCandidateUnpackUser

/// Add explicit Build[Struct]Vector recipes that combine multiple scalar
/// values into single vectors.
static void materializeBuildVectors(VPlan &Plan);
/// values into single vectors and UnpackVector to extract scalars from a
Collaborator:

Suggested change
/// values into single vectors and UnpackVector to extract scalars from a
/// values into single vectors and Unpack recipes to extract scalars from a

Contributor Author:

Updated, thanks

; CHECK-NEXT: store ptr null, ptr [[TMP10]], align 8
; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x ptr> [[TMP3]], i32 1
; CHECK-NEXT: store ptr null, ptr [[TMP11]], align 8
; CHECK-NEXT: store ptr null, ptr [[TMP17]], align 8
Collaborator:

This store to TMP17 replaces the one removed above to TMP4?

Collaborator:

Ah, ok: this store to TMP17 replaces the one to TMP11, which is retained but TMP11 is redefined.

Wonder if some simple "sink each ExtractElement to before its single/first user" pass could help confirm all test differences?

Collaborator @ayalz left a comment:

This LGTM, thanks! Adding some final comments.

Comment on lines +2726 to +2734
static inline bool classof(const VPValue *VPV) {
const VPRecipeBase *R = VPV->getDefiningRecipe();
return R && classof(R);
}

static inline bool classof(const VPSingleDefRecipe *R) {
return classof(static_cast<const VPRecipeBase *>(R));
}

Collaborator:

OK, related to the introduction of unpack?

Contributor Author:

It's needed for this change, due to the new cast/dyn_cast/isa uses with VPSingleDefRecipe.

Comment on lines +3809 to +3811
// TODO: The Defs skipped here may or may not be vector values.
// Introduce Unpacks, and remove them later, if they are guaranteed to
// produce scalar values.
Collaborator:

Should this be placed above - next to "Create explicit...", to also capture skipping VPInstructions?

Suggested change
// TODO: The Defs skipped here may or may not be vector values.
// Introduce Unpacks, and remove them later, if they are guaranteed to
// produce scalar values.
// Current implementation is conservative - it may miss some cases that may
// or may not be vector values. TODO: introduce Unpacks speculatively - remove
// them later if they are known to operate on scalar values.

Contributor Author:

Moved and adjusted, thanks

if (vputils::isSingleScalar(Def) || vputils::onlyFirstLaneUsed(Def))
continue;

// At the moment, we only create unpacks for scalar users outside
Collaborator:

Suggested change
// At the moment, we only create unpacks for scalar users outside
// At the moment, we create unpacks only for scalar users outside

Contributor Author:

Updated, thanks

Comment on lines 3816 to 3817
// replicate regions. Recipes inside replicate regions still manually
// extract the required lanes.
Collaborator:

Suggested change
// replicate regions. Recipes inside replicate regions still manually
// extract the required lanes.
// replicate regions. Recipes inside replicate regions still extract the
// required lanes implicitly.

Contributor Author:

Updated, thanks

Comment on lines 3818 to 3819
// TODO: Remove once replicate regions are
// unrolled completely.
Collaborator:

Suggested change
// TODO: Remove once replicate regions are
// unrolled completely.
// TODO: Remove once replicate regions are unrolled completely.

Contributor Author:

Done thanks

Comment on lines 328 to 330
/// Add explicit Build[Struct]Vector recipes that combine multiple scalar
/// values into single vectors.
static void materializeBuildVectors(VPlan &Plan);
/// values into single vectors and Unpack recipes to extract scalars from a
/// vector as needed.
Collaborator:

  /// Add explicit Build[Struct]Vector recipes to Pack multiple scalar values
  /// into vectors and Unpack recipes to extract scalars from vectors as needed.

Contributor Author:

Updated, thanks!

Comment on lines 1067 to 1071
/// Extracts all lanes from its (non-scalable) vector operand. This is an
/// abstract VPInstruction whose single defined VPValue represents VF
/// scalars extracted from a vector, to be replaced by VF ExtractElement
/// VPInstructions.
Unpack,
Collaborator:

Good to place next to BuildStructVector and BuildVector "Pack" cases - which btw are concrete rather than abstract. (An analogous concrete Unpack which defines VF distinct VPValues could be introduced only after replicating by VF.)

Contributor Author:

Moved, thanks

(isa<VPInstruction>(&R) &&
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes()))
!cast<VPInstruction>(&R)->doesGeneratePerAllLanes() &&
cast<VPInstruction>(&R)->getOpcode() != VPInstruction::Unpack))
Collaborator:

Should Unpack indicate that it doesGeneratePerAllLanes?

Contributor Author:

It may not generate all lanes at the moment, if only its first lane is used; all cases need to be handled by unrolling for now.

@fhahn force-pushed the vplan-explicit-unpack branch from 675b4ec to 133f50a on October 11, 2025 at 20:19